{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas字符串操作"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "前面我们已经使用了字符串的处理函数：<br>\n",
    "df[\"气温(度)\"].str.replace(\"℃\",\"\").astype(\"float\")\n",
    "\n",
    "***Pandas的字符串处理:***\n",
    "1. 使用方法: 先获取Series的str属性，然后再属性上调用函数;\n",
    "2. 只能再字符串列上使用，不能再数字列上使用;\n",
    "3. DataFrame上没有str属性和处理方法;\n",
    "4. Series.str并不是Python原生字符串，而是自己的一套方法，不过大部分和原生str很相似;\n",
    "\n",
    "***Series.str字符串方法列表参考文档:***<br>\n",
    "https://pandas.pydata.org/docs/reference/series.html\n",
    "\n",
    "***本节演示内容:***\n",
    "1. 获取Series的str属性，然后使用各种字符串处理函数\n",
    "2. 使用str的startswith、contains等boolean类Series可以做条件查询\n",
    "3. 需要多次str处理的链式操作\n",
    "4. 使用正则表达式处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. 读取天气数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "## 0. 读取数据\n",
    "fpath=\"datas/weather_20230115134249.csv\"\n",
    "df = pd.read_csv(fpath)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>日期</th>\n",
       "      <th>城市</th>\n",
       "      <th>行政区</th>\n",
       "      <th>观测站</th>\n",
       "      <th>气温(度)</th>\n",
       "      <th>相对湿度(%)</th>\n",
       "      <th>累积雨量(mm)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-01-01</td>\n",
       "      <td>新北市</td>\n",
       "      <td>烏來區</td>\n",
       "      <td>福山</td>\n",
       "      <td>13.7℃</td>\n",
       "      <td>92</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-01-02</td>\n",
       "      <td>臺南市</td>\n",
       "      <td>安平區</td>\n",
       "      <td>安平</td>\n",
       "      <td>23.5℃</td>\n",
       "      <td>70</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-01-03</td>\n",
       "      <td>臺東縣</td>\n",
       "      <td>東河鄉</td>\n",
       "      <td>七塊厝</td>\n",
       "      <td>19.6℃</td>\n",
       "      <td>86</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-01-04</td>\n",
       "      <td>新北市</td>\n",
       "      <td>貢寮區</td>\n",
       "      <td>福隆</td>\n",
       "      <td>14.2℃</td>\n",
       "      <td>96</td>\n",
       "      <td>-99.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>南投縣</td>\n",
       "      <td>仁愛鄉</td>\n",
       "      <td>小奇萊</td>\n",
       "      <td>8.3℃</td>\n",
       "      <td>57</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           日期   城市  行政区  观测站  气温(度)  相对湿度(%)  累积雨量(mm)\n",
       "0  2015-01-01  新北市  烏來區   福山  13.7℃       92       0.0\n",
       "1  2015-01-02  臺南市  安平區   安平  23.5℃       70       0.0\n",
       "2  2015-01-03  臺東縣  東河鄉  七塊厝  19.6℃       86       0.0\n",
       "3  2015-01-04  新北市  貢寮區   福隆  14.2℃       96     -99.0\n",
       "4  2015-01-05  南投縣  仁愛鄉  小奇萊   8.3℃       57       0.0"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "日期           object\n",
       "城市           object\n",
       "行政区          object\n",
       "观测站          object\n",
       "气温(度)        object\n",
       "相对湿度(%)       int64\n",
       "累积雨量(mm)    float64\n",
       "dtype: object"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. 获取Series的str属性，使用各种字符串处理函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.strings.StringMethods at 0x1527c341a58>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"气温(度)\"].str"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    13.7\n",
       "1    23.5\n",
       "2    19.6\n",
       "3    14.2\n",
       "4     8.3\n",
       "Name: 气温(度), dtype: object"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"气温(度)\"].str.replace(\"℃\",\"\").head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2    False\n",
       "3    False\n",
       "4    False\n",
       "Name: 气温(度), dtype: bool"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 判断是不是数字\n",
    "df[\"气温(度)\"].str.isnumeric().head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "Can only use .str accessor with string values, which use np.object_ dtype in pandas",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-8-6f500318aad0>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mdf\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m\"相对湿度(%)\"\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mstr\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;32mD:\\Anaconda\\lib\\site-packages\\pandas\\core\\generic.py\u001b[0m in \u001b[0;36m__getattr__\u001b[1;34m(self, name)\u001b[0m\n\u001b[0;32m   4366\u001b[0m         if (name in self._internal_names_set or name in self._metadata or\n\u001b[0;32m   4367\u001b[0m                 name in self._accessors):\n\u001b[1;32m-> 4368\u001b[1;33m             \u001b[1;32mreturn\u001b[0m \u001b[0mobject\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__getattribute__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   4369\u001b[0m         \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   4370\u001b[0m             \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_info_axis\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_can_hold_identifiers_and_holds_name\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mD:\\Anaconda\\lib\\site-packages\\pandas\\core\\accessor.py\u001b[0m in \u001b[0;36m__get__\u001b[1;34m(self, obj, cls)\u001b[0m\n\u001b[0;32m    130\u001b[0m             \u001b[1;31m# we're accessing the attribute of the class, i.e., Dataset.geo\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    131\u001b[0m             \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_accessor\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 132\u001b[1;33m         \u001b[0maccessor_obj\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_accessor\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    133\u001b[0m         \u001b[1;31m# Replace the property with the accessor object. Inspired by:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    134\u001b[0m         \u001b[1;31m# http://www.pydanny.com/cached-property.html\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mD:\\Anaconda\\lib\\site-packages\\pandas\\core\\strings.py\u001b[0m in \u001b[0;36m__init__\u001b[1;34m(self, data)\u001b[0m\n\u001b[0;32m   1893\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1894\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0m__init__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1895\u001b[1;33m         \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_validate\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   1896\u001b[0m         \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_is_categorical\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mis_categorical_dtype\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1897\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mD:\\Anaconda\\lib\\site-packages\\pandas\\core\\strings.py\u001b[0m in \u001b[0;36m_validate\u001b[1;34m(data)\u001b[0m\n\u001b[0;32m   1915\u001b[0m             \u001b[1;31m# (instead of test for object dtype), but that isn't practical for\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1916\u001b[0m             \u001b[1;31m# performance reasons until we have a str dtype (GH 9343)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1917\u001b[1;33m             raise AttributeError(\"Can only use .str accessor with string \"\n\u001b[0m\u001b[0;32m   1918\u001b[0m                                  \u001b[1;34m\"values, which use np.object_ dtype in \"\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1919\u001b[0m                                  \"pandas\")\n",
      "\u001b[1;31mAttributeError\u001b[0m: Can only use .str accessor with string values, which use np.object_ dtype in pandas"
     ]
    }
   ],
   "source": [
    "df[\"相对湿度(%)\"].str.len()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 使用str的startswith、contains等得到bool的Series可以做条件查询"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "condition=df[\"日期\"].str.startswith(\"2015-11\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2    False\n",
       "3    False\n",
       "4    False\n",
       "Name: 日期, dtype: bool"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "condition.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>日期</th>\n",
       "      <th>城市</th>\n",
       "      <th>行政区</th>\n",
       "      <th>观测站</th>\n",
       "      <th>气温(度)</th>\n",
       "      <th>相对湿度(%)</th>\n",
       "      <th>累积雨量(mm)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>304</th>\n",
       "      <td>2015-11-01</td>\n",
       "      <td>苗栗縣</td>\n",
       "      <td>苑裡鎮</td>\n",
       "      <td>苑裡</td>\n",
       "      <td>17.4℃</td>\n",
       "      <td>99</td>\n",
       "      <td>-99.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>305</th>\n",
       "      <td>2015-11-02</td>\n",
       "      <td>雲林縣</td>\n",
       "      <td>虎尾鎮</td>\n",
       "      <td>虎尾</td>\n",
       "      <td>22.6℃</td>\n",
       "      <td>63</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>306</th>\n",
       "      <td>2015-11-03</td>\n",
       "      <td>嘉義縣</td>\n",
       "      <td>溪口鄉</td>\n",
       "      <td>溪口</td>\n",
       "      <td>22.3℃</td>\n",
       "      <td>64</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>307</th>\n",
       "      <td>2015-11-04</td>\n",
       "      <td>新竹縣</td>\n",
       "      <td>寶山鄉</td>\n",
       "      <td>寶山</td>\n",
       "      <td>15.6℃</td>\n",
       "      <td>83</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>308</th>\n",
       "      <td>2015-11-05</td>\n",
       "      <td>屏東縣</td>\n",
       "      <td>東港鎮</td>\n",
       "      <td>東港</td>\n",
       "      <td>23.7℃</td>\n",
       "      <td>82</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             日期   城市  行政区 观测站  气温(度)  相对湿度(%)  累积雨量(mm)\n",
       "304  2015-11-01  苗栗縣  苑裡鎮  苑裡  17.4℃       99     -99.0\n",
       "305  2015-11-02  雲林縣  虎尾鎮  虎尾  22.6℃       63       0.0\n",
       "306  2015-11-03  嘉義縣  溪口鄉  溪口  22.3℃       64       0.0\n",
       "307  2015-11-04  新竹縣  寶山鄉  寶山  15.6℃       83       1.5\n",
       "308  2015-11-05  屏東縣  東港鎮  東港  23.7℃       82       0.0"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[condition].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. 需要多次str处理的链式操作"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "怎么提取201811这样的数字月份？<br>\n",
    "1.现将日期2018-11-20替换成20181120的形式<br>\n",
    "2.提取月份字符串201811"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    20150101\n",
       "1    20150102\n",
       "2    20150103\n",
       "3    20150104\n",
       "4    20150105\n",
       "Name: 日期, dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"日期\"].str.replace(\"-\",\"\").head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "'Series' object has no attribute 'slice'",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-13-f11e56c799cb>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[1;31m# 每次调用函数，都返回一个新的Series\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mdf\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m\"日期\"\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mstr\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mreplace\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"-\"\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;34m\"\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mslice\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;36m6\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mhead\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;32mD:\\Anaconda\\lib\\site-packages\\pandas\\core\\generic.py\u001b[0m in \u001b[0;36m__getattr__\u001b[1;34m(self, name)\u001b[0m\n\u001b[0;32m   4370\u001b[0m             \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_info_axis\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_can_hold_identifiers_and_holds_name\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   4371\u001b[0m                 \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 4372\u001b[1;33m             \u001b[1;32mreturn\u001b[0m \u001b[0mobject\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__getattribute__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   4373\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   4374\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0m__setattr__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mvalue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mAttributeError\u001b[0m: 'Series' object has no attribute 'slice'"
     ]
    }
   ],
   "source": [
    "# 每次调用函数，都返回一个新的Series\n",
    "df[\"日期\"].str.replace(\"-\",\"\").slice(0,6).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    201501\n",
       "1    201501\n",
       "2    201501\n",
       "3    201501\n",
       "4    201501\n",
       "Name: 日期, dtype: object"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"日期\"].str.replace(\"-\",\"\").str.slice(0,6).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    201501\n",
       "1    201501\n",
       "2    201501\n",
       "3    201501\n",
       "4    201501\n",
       "Name: 日期, dtype: object"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# slice就是切片语法，可以直接用\n",
    "df[\"日期\"].str.replace(\"-\",\"\").str[0:6].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. 使用正则表达式的处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 添加新列\n",
    "def get_nianyueri(x):\n",
    "    year,month,day = x[\"日期\"].split(\"-\")\n",
    "    return f\"{year}年{month}月{day}日\"\n",
    "df[\"中文日期\"] = df.apply(get_nianyueri,axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2015年01月01日\n",
       "1    2015年01月02日\n",
       "2    2015年01月03日\n",
       "3    2015年01月04日\n",
       "4    2015年01月05日\n",
       "Name: 中文日期, dtype: object"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"中文日期\"].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "问题：怎么将\"2015年11月20日\"中的年、月、日三个中文字符去除？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    20150101\n",
       "1    20150102\n",
       "2    20150103\n",
       "3    20150104\n",
       "4    20150105\n",
       "Name: 中文日期, dtype: object"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 方法1：链式replace\n",
    "df[\"中文日期\"].str.replace(\"年\",\"\").str.replace(\"月\",\"\").str.replace(\"日\",\"\").head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***Series.str默认就开启了正则表达式模式***"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2015年01月01日\n",
       "1    2015年01月02日\n",
       "2    2015年01月03日\n",
       "3    2015年01月04日\n",
       "4    2015年01月05日\n",
       "Name: 中文日期, dtype: object"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 方法二：正则表达式替换\n",
    "df[\"中文日期\"].str.replace(\"年月日\",\"\").head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
